{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NumPy Assignment\n",
    "\n",
    "You need to install `numpy` and `matplotlib` by `pip` first.\n",
    "\n",
    "```bash\n",
    "pip install numpy matplotlib\n",
    "```\n",
    "\n",
    "Then import them. Click Shift + Enter in each cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "print(np.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Array Creation\n",
    "\n",
    "**Q1**: List plot the linear, quadratic, and cubic functions. The x-axis ranges from 0 to 1.\n",
    "\n",
    "Hint: To get smooth curves, you need to generate 101 points (with equal distance) along x-axis and y-axis. For example, for linear function, the points are (0,0), (0.01,0.01), ..., (1,1)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = # your code\n",
    "y_linear = # your code (use broadcasting)\n",
    "y_quad   = # your code (use broadcasting)\n",
    "y_cubic  = # your code (use broadcasting)\n",
    "\n",
    "# You need not modify the following codes\n",
    "plt.plot(x,y_linear,label=\"Linear\")\n",
    "plt.plot(x,y_quad,label=\"Quadratic\")\n",
    "plt.plot(x,y_cubic,label=\"Cubic\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Activation Functions\n",
    "\n",
    "[Activation functions](https://en.wikipedia.org/wiki/Activation_function) are important for neural networks to change linear inputs to nonlinear space as to fit complex patterns, which will be covered in the next few courses.\n",
    "\n",
    "**Q2**: In this part, you need to write several commonly used activation functions, including:\n",
    "* sigmoid: $g(x)=\\sigma(x)=\\dfrac{1}{1+\\mathrm{e}^{-x}}$\n",
    "\n",
    "* tanh: $g(x)=\\mathop{tanh}(x)=\\dfrac{\\mathrm{e}^x-\\mathrm{e}^{-x}}{\\mathrm{e}^x+\\mathrm{e}^{-x}}$\n",
    "\n",
    "* ReLU (Recitified Linear Unit): $g(x)=\\begin{cases}0 & x<0\\\\ x & x\\geq 0\\end{cases}$\n",
    "\n",
    "* Leaky RELU: $g(x)=\\begin{cases}\\alpha x & x<0\\\\ x & x\\geq 0\\end{cases}$\n",
    "\n",
    "Notice: `x` is a vector/matrix, and you should perform the operations **element-wise**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = np.random.random((10,)) - 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hint: Use np.copy(...) to deepcopy the array\n",
    "\n",
    "def sigmoid(x):\n",
    "    # your code\n",
    "    pass\n",
    "\n",
    "def tanh(x):\n",
    "    # your code\n",
    "    pass\n",
    "\n",
    "def relu(x):\n",
    "    # your code (do not use if-else)\n",
    "    pass\n",
    "\n",
    "def leaky_relu(alpha,x):\n",
    "    # your code (do not use if-else)\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(z)\n",
    "print(sigmoid(z))\n",
    "print(tanh(z))\n",
    "print(relu(z))\n",
    "print(leaky_relu(z))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Least Squares Problem (Linear Regression)\n",
    "\n",
    "We try to use gradient descent to solve the least squares problem.\n",
    "\n",
    "Say you have 100 points $(x_i,y_i)$ on the plain, and you need to find a linear function\n",
    "\n",
    "$$f(x)=y=Ax + b$$\n",
    "\n",
    "to best fit these points, where $A$ and $b$ are the parameters need to be solved.\n",
    "\n",
    "To get these points, execute the cell below. To simplify the problem, we only consider 2 dimensions here. (But after you learn matrix calculus, you should come back)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 100\n",
    "x = 5 * np.random.random((N,)) # in vector form!\n",
    "y = 5 + 3 * x + np.random.normal(0,2,(N,)) # add Gaussian noise to data\n",
    "plt.plot(x,y,\".\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then how to define \"best fit\"? The easiest way is to use the mean square error (MSE). If the sum of the vertical distances from these points to the line is small, we say this line best fits these points.\n",
    "\n",
    "Therefore, we obtain the target of this problem:\n",
    "\n",
    "$$\\min\\frac{1}{2N}\\sum_{i=1}^N\\|f(x_i) - y_i\\|_2^2=\\min_{A,b}\\frac{1}{2N}\\sum_{i=1}^N(A x_i + b - y_i)^2$$\n",
    "\n",
    "You can simply view the L2-norm $\\|\\cdot\\|_2$ as the Euclidean distance of two points."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analytical Solution\n",
    "\n",
    "Based on what you learn in calculus class, you can obtain the gradient of the target function (or loss function)\n",
    "\n",
    "$$L(A,b) = \\frac{1}{2N}\\sum_{i=1}^N(A x_i + b - y_i)^2$$\n",
    "\n",
    "**Q3**: Solve\n",
    "\n",
    "$\\begin{cases}\n",
    "\\partial L/\\partial A=0\\\\\n",
    "\\partial L/\\partial b=0\n",
    "\\end{cases}$\n",
    "\n",
    "yourself and you can obtain the explicit/close-form solution, denoted as $\\hat{A}$ and $\\hat{b}$.\n",
    "Based on the formula you solve, write corresponding code below.\n",
    "\n",
    "Hint: Check if you get the correct derivatives, see [here](https://en.wikipedia.org/wiki/Ordinary_least_squares)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A_hat = # your code\n",
    "b_hat = # your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check if your line correctly passes across the points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_hat = A_hat * x + b_hat\n",
    "print(\"A_hat: {:.4f}\".format(float(A_hat)))\n",
    "print(\"b_hat: {:.4f}\".format(float(b_hat)))\n",
    "plt.plot(x,y,\".\")\n",
    "plt.plot(x,y_hat)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gradient Descent\n",
    "\n",
    "The linear least square problem is so easy that we can directly solve for the close-form solutioin. However, for most of the optimization problems (including neural networks), we are hard to get the exact form of the solutions. Thus, we need to do some approximation, and gradually approach the best one through iteration. This is where **gradient descent** comes from.\n",
    "\n",
    "Once you obtain the gradient of the loss function, you can use gradient descent to do optimization. Suppose you need to minimize the MSE function $L(A,b)$ above, what you need to do is randomly initialize a starting point, follow the gradient indicator (which implies the shape of the function), and go downward for a small step. At this new point, you check again for the gradient direction, and go down, so on and so forth. Finally, you can get to the valley of the function, where all the other points around you are higher than you, and the iteration stops.\n",
    "\n",
    "![gradient descent](https://cdn-images-1.medium.com/max/1000/0*0ZJ6i6DWqhx-SL5N.gif)\n",
    "\n",
    "Intuitively, you can the process of gradient descent as a ball rolling down the mountain.\n",
    "\n",
    "The basic gradient descent can be formulated as\n",
    "\n",
    "$$\\theta^{(k+1)}=\\theta^{(k)}-\\alpha\\nabla_\\theta L$$\n",
    "\n",
    "where $\\theta$ is the parameter needed to be optimized, and $\\alpha$ is the step size or learning rate. In our case, it becomes\n",
    "\n",
    "$$\\begin{cases}\n",
    "A^{(k+1)}=A^{(k)}-\\alpha\\frac{\\partial L(A,b)}{\\partial A}\\\\\n",
    "b^{(k+1)}=b^{(k)}-\\alpha\\frac{\\partial L(A,b)}{\\partial b}\n",
    "\\end{cases}$$\n",
    "\n",
    "**Q4**: Based on the formulas, implement gradient descent below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cal_loss(x,y,A_hat,b_hat):\n",
    "    \"\"\"\n",
    "    Helper function for calculating MSE loss\n",
    "    \"\"\"\n",
    "    return np.mean((A_hat * x + b_hat - y) ** 2)\n",
    "\n",
    "def gradient_descent(x,y,learning_rate=0.01,iterations=100):\n",
    "    \"\"\"\n",
    "    Gradient descent implementation\n",
    "    \"\"\"\n",
    "    n = len(y)\n",
    "    A_hat = np.random.random(1)\n",
    "    b_hat = np.random.random(1)\n",
    "    cost_history = np.zeros(iterations)\n",
    "\n",
    "    for k in range(iterations):\n",
    "        \n",
    "        # Implement gradient descent here\n",
    "        A_hat_new = # ...\n",
    "        b_hat_new = # ...\n",
    "        \n",
    "        cost_history[k] = cal_loss(x,y,A_hat_new,b_hat_new)\n",
    "        A_hat = A_hat_new\n",
    "        b_hat = b_hat_new\n",
    "        \n",
    "    return A_hat, b_hat, cost_history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can visualize your results below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A_hat_grad, b_hat_grad, cost_history = gradient_descent(x,y)\n",
    "print(\"A_hat: {:.4f}\".format(float(A_hat_grad)))\n",
    "print(\"b_hat: {:.4f}\".format(float(b_hat_grad)))\n",
    "plt.figure(figsize=(10,4))\n",
    "# Loss function\n",
    "plt.subplot(121)\n",
    "plt.plot(cost_history)\n",
    "plt.xlabel(\"Iterations\")\n",
    "plt.ylabel(\"Loss\")\n",
    "plt.title(\"Loss function\")\n",
    "# Fit result\n",
    "plt.subplot(122)\n",
    "plt.plot(x,y,\".\")\n",
    "plt.plot(x,A_hat_grad * x + b_hat_grad)\n",
    "plt.xlabel(\"x\")\n",
    "plt.ylabel(\"y\")\n",
    "plt.title(\"Linear fit\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
